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IDENTIFICATION  OF  A  ERODYNAMIC  COEFFICIENTS 
USING  COMPUTATIONAL  NEURAL  NETWORKS 


Dennis  J.  Linse*  and  Robert  F.  StengeF 
Department  of  Mechanical  and  Aerospace  Engineering 
Princeton  University,  Princeton,  NJ  08544 


Abstract 

Precise,  smooth  aerodynamic  models  are  required  for 
implementing  adaptive,  nonlinear  control  strategies. 
Accurate  representations  of  aerod3'namic  coefficients 
can  be  generated  for  the  complete  flight  envelope  by 
combining  computational  neural  network  models  with 
an  Estimation-Before-Modeling  paradigm  for  on-line 
training  information.  A  novel  method  of  incorporating 
first-partial-derivative  information  is  employed  to  esti¬ 
mate  the  weights  in  individual  feedforward  neural  net¬ 
works  for  each  aerodynamic  coefficient.  The  method 
is  demonstrated  by  generating  a  model  of  the  normal 
force  coefficient  of  a  twin-jet  transport  aircraft  from 
simulated  flight  data,  and  promising  results  are  ob¬ 
tained. 

Introduction 

Modern  nonlinear  control  techniques  offer  exciting  po¬ 
tential  for  aircraft  control  [1-3].  These  techniques  re¬ 
quire  comprehensive  aerodynamic  models  that  must 
be  differentiable  with  respect  to  the  state  and  control. 
Given  the  continuous  state  and  output  equations, 

X  =  f(x)-fg(x)u  (1) 

y  =  h(x)  (2) 

a  Nonlinear-Inverse-Dynamic  (NID)  control  law  [2]  is 
developed  by  repeated  differentiation  of  eq.  2  and  sub¬ 
stitution  of  eq.  1  until  the  control  appears  in  each  el¬ 
ement  of  y  or  its  derivatives.  The  global  represen¬ 
tation  of  the  aerodynamic  model  contained  in  f(  ), 
g(-),  and  h(  )  must  be  sufficiently  smooth  to  calcu¬ 
late  such  a  control  law.  Many  system  identification 
methods  used  for  aircraft  fail  to  pro>nde  the  globally 
smooth  modek  needed  by  these  control  laws.  For  ex¬ 
ample,  the  Estimation-Before-Modeling  (EBM)  tech¬ 
nique  [4,  5]  models  the  aerodynamic  coefficients  us¬ 
ing  multiple  regression  schemes  on  partitions  of  the 
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state  and  control  space.  While  the  partitions  span  the 
space,  these  global  models  are,  in  general,  not  contin¬ 
uous  across  the  partition  boundaries,  much  less  differ¬ 
entiable. 

Maximum-likelihood  estimation  (MLE)  is  the  most 
commonly  used  technique  for  extracting  aircraft  pa¬ 
rameters  from  flight-test  data  [6.  7|.  The  MLE  tech¬ 
nique  postulates  a  parametric  model  (usually  linear, 
more  recently  nonlinear)  of  the  aircraft  and  adjusts  the 
parameters  to  minimize  the  negative  logarithm  of  the 
likelihood  that  the  model  output  fits  a  given  measure¬ 
ment  sequence.  For  flight-test  data,  individual  mod¬ 
els  are  fit  for  each  test  point  generating  estimates  of 
the  aircraft  stability  and  control  derivatives  and  uncer¬ 
tainty  levels.  Since  the  primaiy  use  of  the  deriratives  is 
verification  of  predicted  performance,  little  more  than 
fairing  is  done  to  generate  global  models  through  the 
test  points  (6).  Iterative  minimization  procedures  are 
emploj'ed,  so  on-line  estimation  is  not  possible. 

Non-smooth  aerodynamic  models  can  be  coerced 
into  smooth  models  with  sufficient  post-processing, 
but  this  post-processing  may  reduce  the  fidelity  of 
the  model.  In  [2]  extensive  tabular  aerodynamic  data 
were  fitted  with  cubic  splines  before  applj-ing  NID  con¬ 
trol  techniques.  Cubic  splines  ensured  continuity  and 
smoothness  across  subspace  boundaries. 

An  on-line  sj'stem  identification  and  nonlinear  con¬ 
trol  paradigm  that  integrates  computational  neural 
networks  as  an  aerodynamic  modeler  wnthin  the  EBM 
framework  was  presented  in  (8,9).  Using  the  plant  mea¬ 
surements,  t,  an  extended  Kalman-Bucy  filter  (EKBF) 
[10]  estimates  the  state  of  an  augmented  plant  model 
consisting  of  the  standard  aircraft  state,  x(f ),  and  the 
aircraft  specific  forces  and  moments,  g(t).  The  neural 
network  learns  an  aerodynamic  model,  g(x,u),  from 
x(f),  g(f),  and  the  control,  u(t).  A  nonlinear  control 
law  using  the  estimated  state,  x(f),  and  the  aerody¬ 
namic  model,  g(x.  u),  completes  the  loop  and  forms 
an  adaptive,  nonlinear  control  system.  A  block  dia¬ 
gram  of  the  complete  controller  is  given  in  Fig.  1. 

Using  the  s»’<»e/force  EKBF  and  the  network  esti¬ 
mation  model,  excellent  matches  of  aerodynamic  coef¬ 
ficient  histories  were  achieved  with  a  simple  neural  net- 


Figure  1;  Integration  of  System  Identification  and 
Nonlinear  Control. 

work  model  for  single  flight  conditions,  and  the  corre¬ 
sponding  functional  relationship  g(x,  u)  was  well  iden¬ 
tified  [9].  Expanding  the  number  of  training  flight  con¬ 
ditions  to  span  the  flight  envelope  continued  to  yield 
network  models  with  excellent  matches  of  the  coeffi¬ 
cient  histories:  however,  g(x,  u)  and  the  correspond¬ 
ing  aircraft  stability  and  control  deriratives  were  not 
equally  well  estimated  over  the  entire  space. 

This  paper  describes  a  method  of  improved  aerody¬ 
namic  modeling  that  augments  the  EBM-based  iden¬ 
tifier  and  neural  network  modeler  with  additional  in¬ 
formation  about  the  first  partial  derivatives  of  the 
aerodynamic  coefficients.  Including  both  function  and 
derivative  information  in  the  network  training  algo¬ 
rithm  constrains  and  directs  the  training  by  reducing 
the  number  of  possible  functions  that  match  a  par¬ 
ticular  measurement  sequence.  Considerable  gain  is 
achieved  in  network  accuracy  with  these  methods,  and 
representation  of  aerodynamic  derivatives  is  improved 
substantially. 


W  and  b  are  the  weights  and  biases  of  the  indicated 
layers,  and  jr[-]  is  a  vector-valued  function  composed 
of  individual  node  activation  functions: 

^■[q]  =  K(9i)---<^n(9„)r  (4) 

The  outputs  of  layer  1:  —  1  are  connected  to  the  inputs 
of  layer  k  successively  such  that  the  final  form  of  an 
-layer  network  is 

. . . 

4,(1  )  +  •••))  (5) 

Defining  r  =  r*®'  as  the  input  vector  and  s  =  as 
the  output  vector, 

*  =  h(r:w)  (6) 

where  w  is  a  vector  combining  all  the  weights  and 
biases  of  all  the  network  layers  and  h(-)  is  the  overall 
function  of  r  and  w.  A  sample  two-input/one-output, 
one  hidden-layer  network  is  given  in  Fig.  2. 


Figure  2;  A  simple  feedforward  network. 


Computational  Neural  Networks 

Computational  neural  networks  are  biologically-in- 
spired,  massively  parallel  computational  paradigms. 
Computational  neural  networks  are  used  in  a  variety 
of  applications  because  they  are  adaptive,  they  learn 
from  examples,  and  they  can  provide  excellent  func¬ 
tion  approximation. 

Multilayer  feedforward  neural  networks  combine 
simple  nonlinear  computational  elements  into  a  lay¬ 
ered  architecture  to  compute  a  static  nonlinear  func¬ 
tion.  Individual  network  nodes,  with  specified  non¬ 
linear  or  activation  functions,  are  grouped  together  in 
distinct  layers,  with  multiple  layers  connected  for  the 
full  network.  the  output  of  the  fc***  layer  of  the 
network,  is  computed  as  the  nonlinear  transformation 
of  the  weighted  sum  of  the  outputs  of  the  (k  -  1)*‘ 
layer  and  a  bias: 

,(*)  s  4,(t)  +  b<*'>’]  (3) 
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Smoothness  of  a  network  function  is  important  if  the 
network  is  to  be  used  in  the  suggested  nonlinear  con¬ 
trol  paradigms.  The  order  of  differentiability  of  eq.  6, 
by  its  construction,  is  the  same  as  that  of  the  node  ac¬ 
tivation  functions,  (t(-).  Assuming  all  activation  func¬ 
tions  are  at  least  once  differentiable,  the  derivative  of 
the  network  with  respect  to  its  inputs  is 


_  =  — - _ W(0)  /yv 

The  derivatives  of  the  activation  function  vectors  are _ 

diagonal  matrices;  'or 


0 


(8) 


□ 
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The  logistic  sigmoid  is  a  commonly  used  node  activs' 
tion  function:  ,/ 

0(9)  = 


H-c-« 


'-y  Codes 


Ul3t 


end/or 

dolal 


It  is  infinitely  differentiable,  and  its  first  derivative  is 
quite  simply  calculated  as 

^  =  a'  =  «r(9)[l-«T{9)]  (10) 

A  network  composed  of  logistic  sigmoid  activ’ation 
functions,  weights,  and  biases  is  infinitely  differen¬ 
tiable. 

Layered  feedforward  neural  networks  are  closely  re¬ 
lated  to  basis  function  approximation  and  regulariza¬ 
tion  theory  (11).  In  principle,  any  continuous  func¬ 
tion  [12]  or  n**'-order  differentiable  function  [13]  can 
be  approximated  accurately  by  such  networks;  how¬ 
ever,  there  is  no  guarantee  that  the  desired  network 
size  can  be  chosen  and  weights  learned  from  a  set  of 
(possibly  noisy)  training  examples. 

Given  a  set  of  Np  training  examples,  {ri,d,},  wi.ere 
d,  =  d(ri)  is  value  of  d(  ),  the  function  to  be  approx¬ 
imated.  for  a  given  input,  Ti,  a  suitable  least-squares 
cost  function  is 

A’, 

J  =  '^eje,  (11) 

1=1 

where  the  network  error  vector  for  the  i‘*'  trial,  c,,  is 


The  extended  Kalman  filter  is  a  particularly  attrac¬ 
tive  alternative  method  that  transforms  the  optimiza¬ 
tion  of  eq.  11  and  training  of  eq.  6  into  the  estimation 
of  a  dynamic  sjstem  [18].  A  localized  version  of  the 
extended  Kalman  filter  algorithm  also  has  been  devel¬ 
oped  [19].  Our  version  of  the  extended  Kalman  filter 
follows  the  first  approach,  as  described  below. 

Extended  Kalman  Filter  for  Network  Training 

The  state  and  measurement  vectors  of  a  nonlinear, 
discrete-time  system  are  given  by 

xjr+i  =  ^{Xi)^-n^  (13) 

=  h{xk)  +  yk  (14) 

where  x,n  €  /?",*,  v  e  i?"*,  and  d>(  )  and  h(  )  are  gen¬ 
eral  nonlinear  functions  in  i?"  and  {n*}  and  {v*} 
are  zero  mean,  white  Gaussian  processes  with  co\’ari- 
ances  Q*  and  R*.  Xo  is  a  Gaussian  random  variable 
with  mean,  Xq.  and  covariance.  Po- 

Assuming  <(>(•)  and  h(  )  are  sufficiently  smooth,  an 
extended  Kalman  filter  can  be  defined  for  the  system, 
eq.  13  and  14  (see,  e.g.,  [10,20]).  The  filter  equations 
are 


c,  =  d.  -  h(r,,w)  (12) 

After  selecting  the  number  of  la3'ers  and  the  number 
of  nodes  in  each  laj'er,  J  is  minimized  with  respect  to 
w,  giving  the  minimum  mean-squared-error  network. 

Searching  for  the  weight  vector  that  minimizes  eq.  11 
may  be  a  difficult  nonlinear  optimization  problem. 
The  problem  is  often  constrained  by  the  desire  to  keep 
the  weight  adjustment  algorithm  localized,  that  is,  to 
update  the  weights  associated  with  a  node  only  us¬ 
ing  information  a\’aiiable  at  that  node.  Localized  al¬ 
gorithms  simplify  hardware  implementation  although 
they  may  ignore  coupling  information  that  is  impor¬ 
tant  to  training. 

Traditional  nonlinear  optimization  algorithms  in¬ 
cluding  steepest  descent,  conjugate  gradients,  and  \-ar- 
ious  Newton-type  algorithms,  have  been  applied  to 
feedforward  computational  neural  networks  [14-17]. 
Back-propagation,  a  steepest-descent  algorithm,  is  the 
most  common  algorithm  [14].  Modifications  and  im¬ 
provements  to  the  back-propagation  algorithm,  such 
as  learning  momentum,  have  been  developed  [14-16]. 

Conjugate  gradient  methods  have  been  successfully 
applied  to  neural  networks  (see,  e.g.,  [15]  for  a  de¬ 
scription  of  the  method).  A  localized  version  of  the 
Marquardt-Levenberg  leut-squares  optimization  tech¬ 
nique  is  developed  in  [17].  The  Marquardt-Levenberg 
algorithm  progresses  from  a  steepest-descent  algo¬ 
rithm  to  a  quasi-Newton  algorithm  as  optimization 
proceeds.  Limited  testing  of  the  basic  Marquardt- 
Levenberg  algorithm  for  a  simple  network  show’ed  no 
significant  benefit  in  the  present  application. 


=  f{x* 

-i) 

(15) 

=  ♦*- 

iP.-i*Li 

+  Q* 

(16) 

Kk 

=  Pi"’ 

'Hi  (h^P^- 

-^Hl  +  RkY' 

(17) 

X* 

= 

-J-Ki-  |zit  - 

-h[xl->]} 

(18) 

P* 

=  (I- 

K*H*)PL‘ 

’(I-KtHO’’ 

+KtRiKj 

(19) 

The  filter  is  initialized  with  Xo  =  Xo  (the  mean)  and 
Po-  The  superscript  (-)  indicates  an  intermediate 
value  after  propagation  but  before  updating  with  new 
information.  Fjt  and  are  the  state  and  measure¬ 
ment  Jacobian  matrices: 


♦* 

H* 


dx 

ah 

ax 


Ixt 

xl-' 


(20) 

(21) 


A  sj'mroetric  form  is  used  for  the  cosMance  update 
(eq.  19)  ensuring  that  Pk  remains  symmetric  and  pos¬ 
itive  definite  as  long  as  is  positive-definite  and 
is  positive-semidefinite  at  the  expense  of  more  compu¬ 
tation  [10]. 

The  extended  Kalman  filter  is  derived  from  the  lin¬ 
ear  Kalman  filter  unth  the  assumption  that  nonlin¬ 
ear  effects  are  modeled  adequately  by  the  propagation 
of  the  mean  (eq.  15),  while  disturbances  and  mea¬ 
surement  errors  are  small  enough  to  allow  linear  es¬ 
timates  of  covariance  evolution  and  measurement  up- 
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date  (eq.  16-19).  The  EKF  is  neither  linear  nor  op¬ 
timal,  but  if  the  system  is  sufiBciently  well  behaved, 
good  results  are  obtained. 


Modeling  the  Network  for  Kalman  Filter  TVain- 
ing 


Identifying  the  network  weight  vector  w  as  the  state 
vector  X  in  eq.  13.  a  neural  network  can  be  expressed 
as  a  nonlinear  dynamic  system: 

w*  =  wt_i-fn*_i  (22) 

2k  -  h{wk,rk)-¥vk  (23) 


With  the  process  and  measurement  noise  sequences, 
{n*}  and  {t’*},  as  defined  above,  eq.  22  and  23  have 
the  form  of  eq.  13  and  14,  where  ^(x)  =  x  and  h  is 
defined  by  the  network  forward  propagation  (eq.  6). 
For  simplicity,  the  network  is  assumed  to  have  one 
output. 

Given  a  sequence  of  noise  corrupted  measurements, 
{;*},  and  the  corresponding  network  input  sequence, 
{r^},  the  weight  vector,  w,  is  estimated  by  an  ex¬ 
tended  Kalman  filter.  For  eq.  22,  the  state  Jacobian 
matrix.  Th.  is  an  identity  matrix  for  all  k.  This  makes 
the  covariance  propagation  step,  eq.  16,  very  simple. 
Computation  of  the  measurement  Jacobian  matrix. 


H* 


dw 


(w*,r.) 


(24) 


is  a  straightforward  application  of  the  chain  rule,  given 
the  differentiability  of  the  network  nonlinearities  and 
the  form  of  eq.  6.  Hfc  has  dimension  1  x  jS’u.,  where 
Nu  is  the  number  of  weights  in  the  network. 

The  propagation  of  w/t  (eq.  22)  is  modeled  as  a  dy¬ 
namic  system  driven  by  a  white  Gaussian  process  al¬ 
though  w  is  an  internal  variable  that  is  not  subject 
to  noise.  The  co^•ariance,  Q.  of  the  pseudo-noise  se¬ 
quence,  {n*},  controls  the  convergence  of  the  state 
co^•a^iance,  P*.  In  an  undriven  system  (Q  =  0).  the 
covariance  matrix  converges  to  zero  [21].  This  prevents 
further  state  updating  since,  by  eq.  17,  Kj^  is  zero  if 
P*  =  0. 

The  order  of  an  extended  Kalman  filter  used  as  a 
network  training  algorithm  grows  as  the  length  of  the 
network  weight  vector.  The  largest  network  used  in 
this  paper  is  a  single  hidden-layer  network  with  9  in¬ 
puts,  20  hidden  nodes,  and  one  output,  resulting  in 
221  weights  and  biases.  Ordinarily,  a  Kalman  filter  of 
this  size  would  be  difficult  to  compute;  however,  the 
matrix  to  be  inverted  in  eq.  17  has  dimension  m  x  m, 
where  m  is  the  output  dimension.  As  m=l,  the  matrix 
inversion  is  simply  a  scalar  division. 


Incorporating  Partial  Derivatives  in  Network 
lyaining 

For  a  scalar  function  </(•),  the  network  training  error 
(eq.  12)  is  scalar  and  the  cost  (eq.  11)  is  based  only  on 


errors  in  the  function  \'alue;  however,  useful  training 
information  also  is  contained  in  the  error  between  de¬ 
sired  and  actued  slopes.  If,  in  addition  to  the  training 
data  {r,,  d,  },  measurements  of  the  first  partial  deri\’a- 
tives  of  d(-)  e\-aluated  at  r^. 


ad 

ar 


r. 


(25) 


are  available,  a  new  weighted  least-squares  cost  func¬ 
tion  can  be  defined: 


J  =  ^ejR-U,  (26) 

•  1=1 

where  R“^  is  an  appropriately  dimensioned  weighting 
matrix.  C;  is  formed  by  augmenting  the  original  scalar 
error,  e,,  such  that 


Cl  = 


d,  -  />  (w.r,) 
^(r.)-|f  (w.r,) 


(27) 


In  eq.  26,  R“’  weights  the  relative  importance  of  each 
element  of  c,,  that  is.  the  importance  of  errors  in  the 
fit  of  the  function,  h,  compared  to  errors  in  the  fit  of 
each  of  the  derix'atives,  dh/drj. 

Although  any  of  the  previously  mentioned  network 
training  algorithms  could  be  reformulated  based  on 
the  function-derivative  cost,  the  EKF  training  method 
is  extended  here.  The  propagation  of  w*  (eq.  22) 
remains  the  same,  while  the  measurement  equation 
(eq.  23)  is  expanded  to  include  the  partial  deri\atives: 


Zk  = 


h  (w*.r*) 

I?  (w*.r*) 


+  V* 


(28) 


dh/dr  is  calculated  using  eq.  7.  With  this  measure¬ 
ment  equation,  the  measurement  Jacobian  matrix  be¬ 
comes 


(w»,r*) 


(29) 


This  expanded  matrix  Ht  has  dimension  (A’^j+l)x  A'.,, 
where  Ni  is  the  number  of  network  inputs. 

In  the  EKF  formulation,  the  covariance,  R,  of  the 
measurement  noise  sequence,  v^,  is  the  same  as  R  in 
the  weighted  least-squares  cost  function  (eq.  26).  It 
adjusts  the  relative  importance  of  each  of  the  mea¬ 
surements  during  network  training.  Equivalent!}',  it 
weights  the  relative  accuracy  of  each  of  the  measure¬ 
ments.  While  containing  more  information  about  the 
function,  the  partial  derivatives  may  also  contain  more 
errors.  Thus,  a  suitable  R  is  important  for  accurate 
network  learning. 


4 


IVaining  Example 

A  simple  example  demonstrates  the  added  benefits  of 
function-derh-ative  training  (FD-training)  over  a  stan¬ 
dard  function  training  (F-training)  method.  This  ex¬ 
ample  exercises  only  the  aerodynamic  model  block  in 
Fig.  1  using  random  data  rather  than  the  dynamic  data 
that  would  be  produced  by  simulated  or  actual  flight 
tests.  The  demonstration  function  is  a  lift  coefficient 
curve  assumed  to  be 

Ci(a,q,6£)  =  Cist(o)  +  ^"1,4  +  (30) 

where  o  is  the  angle  of  attack,  g  =  gcf2V  is  the 
non-dimensionalized  pitch  rate,  and  6e  is  the  elevator 
deflection  angle.  Realistic  numerical  values  are  cho¬ 
sen  for  the  aerodynamic  deri\atives  based  on  a  twin- 
jet  transport  aircraft  [22].  The  pitch-rate  derhative, 
Cl,{=  dCi/dq)  is  7.0  and  the  elevator-angle  deriva¬ 
tive,  Ci,^(=  dCi/ddE),  i.s  0.006  deg“‘.  The  angle- 
of-attack  contribution,  Cijt(q),  is  a  linear  function 
for  low  angles  of  attack  and  a  quadratic  function  for 
higher  angles  of  attack.  It  is  modeled  as 


Equation  32  represents  q  variations  of  ±20®/sec  at 
the  given  velocity,  V.  The  lift  coefficient,  Ci,  is 
calculated  from  eq.  30  and  is  corrupted  with  zero- 
mean  Gaussian  noise  with  a  variance  of  0.1.  The 
first  network  is  trained  using  only  function  informa¬ 
tion  (F-training).  The  second  network  is  trained 
with  the  function-derivative  method  using  additional 
measurements  of  dCi/da,  dCi/dq,  and  dCildbE 
(FD-training).  These  measurements  are  also  cor¬ 
rupted  with  zero-mean  Gaussian  noise  with  covari¬ 
ances,  0.5,  6.0  X  10“^,  and  0.7.  The  initial  state  co- 
variance,  Po,  is  10^1  and  the  constant  process  noise 
covariance,  Q.  is  10“®I,  where  I  is  the  appropriately 
sized  identity  matrix.  The  measurement  noise  covari¬ 
ance  matrices  are  identical  to  the  simulated  noise  used 
in  the  example.  Thus,  for  F-training,  R  =  0.1,  while 
for  FD-training 


0.1  0  0  0 

0  0.5  0  0 

0  0  6.0  X  10"^  0 

0  0  0  0.7 


(34) 


f  0.0952Q -(-0.1048  a  <  9' 

\  -0.0095q2  +  0.2667Q  -  0.6667  o  >  9° 

The  restriction  of  Cl  to  a  and  6e  inputs  is  plotted  in 
Fig.  3. 


Figure  3;  Example  Lift  Coefficient,  Cl,  for  Fixed  q. 

Using  eq.  30  as  the  desired  function,  d,  two  identi¬ 
cal  feedforward  networks  with  three  inputs,  one  hidden 
layer  of  10  sigmoidal  nodes,  and  one  output,  are  ini¬ 
tialized  with  the  same  random  weights  chosen  from  a 
zero-mean  Gaussian  distribution  with  variance  of  0.1. 
The  training  data  are  selected  by  randomly  choosing 
1500  points  from  a  uniform  distribution  bounded  by 


a  € 

(31) 

9  e 

(-0.005,0.005] 

(32) 

6e  € 

(-20%  20*1 

(33) 

A  comparison  of  the  variation  of  eq.  30  and  its  three 
partial  derivatives  to  the  two  network  approximations 
of  this  function  is  given  in  Fig.  4  for  angle-of-attack 
variations.  These  plots  represent  one-dimensional 
slices  through  the  fully  trained  three-dimensional  in¬ 
put  space.  The  FD-training  is  better  for  the. func¬ 
tion,  and  it  is  much  better  for  the  derivatives.  The 
rms  errors  of  the  function  and  derivatives  for  the 
FD-trained  function  are  all  considerably  smaller  than 
the  F-trained  function. 

The  results  also  indicate  the  possibility  of  faster 
learning  as  measured  by  the  number  of  training  points 
necessary  for  accurate  modeling.  The  FD-trained  net¬ 
work  quickly  learned  a  representation  of  the  desired 
function.  The  total  rms  error  between  the  function 
and  its  three  derivatives  and  the  network  model  ap¬ 
proached  its  minimum  value  after  the  presentation  of 
about  500  data  points  and  remained  relatively  un¬ 
changed  for  the  remaining  points.  The  F-trained  net¬ 
work  required  the  full  1500  points  to  reach  a  level  of 
accuracy  less  than  that  of  the  FD-trained  network. 

Including  partial  derivativ'es  in  the  network  training 
process  substantially  improves  the  ability  of  the  train¬ 
ing  process  to  find  an  accurate  model  of  the  desired 
function  and  its  derivatives. 

Estimating  Aerodynamic  Coefficient 
and  Derivative  Histories 

The  benefits  of  incorporating  gradient  information 
from  the  training  function  into  neural  network  learn¬ 
ing  are  well  demonstrated  in  the  previous  section.  Af¬ 
ter  a  brief  summarv*  of  the  extended  Kalman-Bucy  fil¬ 
ter  equations,  the  EKBF  that  estimates  the  aircraft 
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Figure  4:  Network  Function  and  Derivative  Comparison  at  g  =  0,  =  0. 


aerodynamic  coefficient  histories  in  addition  to  state 
histories  is  described.  Subsequently,  an  extension  of 
the  state/force  EKBF  is  developed  that  estimates  the 
desired  partial  deri^'ative  histories  in  addition  to  the 
coefficient  histories. 

Extended  Kalman-Bucy  Filter  for  State  and 
Force  Estimation 

The  state  and  measurement  equations  for  a  combined 
continuous  state/sampled-data  measurement  system 
are: 

x(0  =  f(x(t),n(<),f]  (35) 

Xi  =  h[x(f*),<*]  +  v*  (36) 

where  x  €  Jf",  *,v  €  /?”,  n  €  /?*,  and  f(  )  and  h(  ) 
are  general  nonlinear  functions  in  H"  and  /f"*.  The 
process  noise,  n(t),  is  a  white,  zero-mean  Gaussian 
random  process  with  spectral  density  matrix,  Qc(0- 
The  measurement  noise  sequence,  {v*},  is  a  zero- 
mean,  Gaussian  random  process  with  covariance,  Rt. 
Xo  is  a  Gaussian  random  s’ariable  with  mean,  ia,  and 
cos'ariance,  Po- 


For  sufficiently  smooth  f(  )  and  h(  ),  a  hybrid  ex¬ 
tended  Kalman-Bucy  filter  can  be  defined  [10].  The 
hybrid  EKBF  uses  the  continuous  equations  for  the 
state  and  covariance  propagation  and  the  discrete 
equations  for  the  gain  and  measurement  update  cal¬ 
culations.  The  continuous  equations  are 

f[x(r),0,T]  dr  (37) 
p(-)|t*]  =  P[t*_,)-b  r  P(r)dr  (38) 

where 

P(r)  =  F(r)P(r)  -b  P(r)FT(r)  +  L(r)Qc(r)L’'(r) 

(39) 

The  gain  and  update  equations  (eq.  17-19)  are  the 
same  as  the  EKF  with  the  definitions  Pj”^  =  P'“l[tt] 
and  x^"*  =  x'”'[f*].  The  linearized  matrices  F,  L, 
and  H  are 

F(f)  =  ^  (40) 
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L(f) 

Hk 


dt(-) 

dn 

dhj) 

dx 


(41) 

(42) 


and  each  is  evaluated  at  the  current  value  of  x. 


Estimating  State  and  Force  Histories 

Follo^^-ing  a  procedure  analogous  to  Estimation  Before 
Modeling  [4,5],  the  system  is  represented  by 

x(0  =  f[x(0]  +  g[x(0>«(01  +  “(0  (43) 

z*  =  h[x{tt)]  +  v*  (44) 

in  which  f  (x)  is  known  without  error  and  g|x,  u]  is  sub¬ 
ject  to  uncertainty  or  change.  Let  g(f)  =  g[x(t),u(t)] 
for  some  period  of  system  operation.  Subject  to  nec¬ 
essary  observability  requirements,  an  augmented  state 
vector,  Xa(1).  consisting  of 


Xo{t)  = 


■  x(t)  ■ 

.  8(0  . 


(45) 


is  estimated  with  an  extended  Kalman-Bucy  filter.  If 
eq.  43  represents  the  6-degree-of-freedom  equations  of 
motion  of  a  rigid  aircraft  (see  [23]),  6  elements  of  g{f) 
are  the  specific  forces  and  moments  due  to  aerodj- 
namic,  thrust,  and  control  inputs.  The  remaining  ele¬ 
ments  of  g(t)  are  zero,  as  the  kinematic  relations  are 
independent  of  the  aerodynamic,  thrust,  and  control 
effects. 

In  [4. 5].  the  specific  forces  and  moments.  g(/),  are 
modeled  as  a  second-order,  integrated,  random-walk 
processes,  ensuring  continuity  and  smoothness  of  the 
estimated  histories.  The  equations  for  each  ^.(f)  are 


coefficient  (Cz)  histories  w’ere  demonstrated  in  [9].  De¬ 
ficiencies  in  the  representation  of  Cz(x,  u)  and  the 
corresponding  aircraft  stability  and  control  derivatives 
prompted  further  research  into  the  network  training 
process  and  resulted  in  the  FD-training  methodol- 
ogj'.  To  provide  the  needed  gradient  histories,  the 
state/force  estimator  must  be  augmented. 

Assuming  the  same  state  and  measurement  equa¬ 
tions,  eq.  43  and  44,  first  partial  derivatives  of  the 
forces  and  moments  are  estimated  by  retaining  some 
state  explicit  dependence  in  g(  ).  As  an  example, 
d%ldx\  is  estimated  in  the  following  manner.  Let 
g[a-i(f).f)]  =  g[x(f),u(f)]  for  some  period  of  system 
operation.  With  the  dependence  on  xi(f)  remaining, 
the  derivative  of  each  g,  is 

h  =  J- 

at  aj,  at 

=  fli.oi  +  (48) 


where  ij  is  obtained  from  eq.  43.  and  p,,o)  and  gi,\\  are 
the  unknown  partial  derivatives.  Following  the  previ¬ 
ous  development,  each  of  the  unknown  partial  deriva¬ 
tives  is  modeled  as  first-order,  integrated,  random- 
walk  processes.  For  each  element  of  g(f). 
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where  n;  are  zero-mean,  white  Gaussian  random  pro¬ 
cesses  and  9,,;  are  internal  state  elements.  Using  this 
model  of  g(t),  the  aircraft  state  dynamics  are  aug¬ 
mented  with  18  (=  6  forces/moments  x  3  equations) 
specific  force  and  moment  equations  such  that 

Xo(f)  =  Ci[xa(<)j  +  no(f)  (47) 

After  estimating  Xg,  the  6  forces  and  moments  can 
be  normalized  using  air  density,  airspeed,  reference 
area,  and  reference  length  to  the  non-dimensional  co¬ 
efficients:  Cxi  Cy,  Czi  Ciy  Cmi  *nd  Cs- 

Estimating  State,  Force,  and  Force-Gradient 
Histories 

Using  the  state/force  estimator  and  a  feedforward  neu¬ 
ral  network,  excellent  representations  of  normal  force 


w’here  n,,o  and  tj,.i  are  zero-mean,  white  Gaussian  ran¬ 
dom  processes  and  ,02  and  g,  u  internal  state 
elements. 

A  new  augmented  state  equation  (eq.  47)  is  formed 
using  the  original  aircraft  equations  of  motion  and 
eq.  49.  Subject  to  the  necessary'  (possibly  more  strin¬ 
gent)  observability  conditions,  the  augmented  state  is 
estimated  aith  an  extended  Kalman-Bucy  filter.  In¬ 
cluded  in  the  estimated  state,  Xa,  are  estimated  his¬ 
tories  of  the  original  state,  x(t),  of  the  forces  and  mo¬ 
ments,  g((),  and  of  the  partial  derivatives  of  the  forces 
and  moments  with  respect  to  the  state  Z] , 


The  partial  derivative  of  g  with  respect  to  any  other 
state  elements  can  also  be  estimated  at  the  expense  of 
additional  elements  in  the  augmented  state  vector,  Xg. 
Each  force/force-gradient  history*  requires  the  addition 
of  3-(-2/  state  elements,  where  I  is  the  number  of  partial 
derivatives  estimated. 
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Figure  5:  Twin-Jet  IVansport  Flight  Envelope  and 
Training  Trim  Points. 


Aerodynamic  Model  Identification  from 
Simulated  Flight-Test  Data 

The  aerodynamic  coefficient  identification  scheme  is 
demonstrated  by  generating  a  model  of  the  normal 
force  coefficient.  C^(x.  u),  of  a  twin-jet  transport  air¬ 
craft  based  on  simulated  flight-test  data.  This  demon¬ 
stration  exercises  the  plant,  the  state/force  EKBF,  and 
the  aerodynamic  model  (in  the  form  of  a  neural  net¬ 
work)  of  Fig.  1.  The  results  of  training  two  identi¬ 
cally  initialized  networks  with  the  EKF  F-training  and 
FD-training  methods  are  presented  and  compared.  A 
single  estimated  deris-ative  history,  is  included 

in  the  FD-training  method. 

Simulated  Flight-Test  and  Training  Data  Gen¬ 
eration 

Simulated  flight-test  data  are  generated  from  a  full, 
nonlinear,  6-degree-of-freedom  model  of  a  twin-jet 
transport.  Standard  rigid-body  aircraft  equations  of 
motion  are  used  [23].  The  simulation  contains  an  in¬ 
ternal  aerodynamic  model  based  on  extensive  tabular 
data  and  algebraic  constraints  [22].  Five  control  inputs 
drive  the  system:  throttle  position  (67-),  ele^'ato^  de¬ 
flection  aileron  deflection  (64),  rudder  deflection 
(6a),  and  flap  position  (6/-). 

The  static  training  data  are  generated  at  60  trim 
points  that  span  the  aircraft  operational  flight  en¬ 
velope  (Fig.  5).  Static  training  data  consist  of  the 
aircraft  state  and  all  of  the  require  aerodynamic  co¬ 
efficients  and  derivatives  for  the  trimmed  condition. 
Static  data  points  were  calculated  exactly,  as  no  esti¬ 
mation  was  required. 

Dynamic  training  data  are  generated  around  9  ad¬ 
ditional  trim  points.  These  points  are  numbered  in 
Fig.  5.  The  first  six  points  outline  the  basic  operat¬ 


ing  envelope,  and  the  remaining  three  points  are  dis¬ 
tributed  through  the  interior.  The  dynamic  data  are 
produced  using  the  nonlinear  aircraft  simulation  and 
the  state/force  EKBF.  At  each  point,  the  aircraft  is  ex¬ 
cited  by  a  specific  input,  measurements  are  recorded, 
and  an  EKBF  is  used  to  estimate  the  aircraft  states 
and  forces  needed  for  network  training. 

To  ensure  the  obser\'ability  required  by  the  Kalman- 
Bucy  filter,  an  extensive,  but  feasible  [5],  measurement 
vector  is  defined.  The  13  measured  variables  are  lin¬ 
ear  accelerations  (a^,  Uy,  a,),  angular  accelerations 
(fli,  flm,  o„),  angular  rates  (p.  q.  r),  total  velocity  (!'), 
angle  of  attack  (a),  angle  of  sideslip  (0),  and  altitude 
{h).  Process  and  measurement  noise  are  modeled  as 
zero-mean,  Gaussian  random  sequences  with  co>‘ari- 
ances  determined  from  [5]. 

Starting  from  a  specified  trim  condition,  the  aircraft 
is  excited  using  a  ±10"  “3211"  ele\-ator  input.  “3211" 
inputs  consist  of  alternating  steps  of  3.  2.  1,  and  1 
time-unit  widths.  With  appropriately  chosen  vidths, 
this  input  historj"  provides  a  sufficiently  rich  input  for 
good  estimation  of  aircraft  parameters  [24].  Twenty- 
second-long  measurement  histories  rapturing  all  of  the 
relevant  motion  are  stored  for  each  dynamic  training 
point. 

Using  the  state/forre  EKBF.  the  aircraft  state  and 
force  histories  are  estimated  from  the  given  measure¬ 
ment  histories.  Initial  state,  process  noise,  and  mea¬ 
surement  noise  cos’ariance  matrices  are  based  on  [5]. 
From  the  estimated  force  histories  the  estimated  nor¬ 
mal  force  coefficient  history,  Cz(f).  is  calculated  from 
the  estimated  density,  airspeed,  and  the  aircraft  wing 
area.  At  the  time  of  writing,  the  force-gradient  es¬ 
timation  algorithm  had  not  been  implemented.  To 
demonstrate  the  anticipated  benefits  of  the  function- 
deris'ative  training  algorithm.  Cz.  (t)  is  calculated 
from  the  tabulated  aerodynamic  data  using  a  finite- 
difference  scheme  and  the  currently  estimated  state, 
x(f). 

Neural  Network  Model  and  Training  Procedure 

A  9-input,  single-hidden-layer  network  structure  with 
20  logistic  sigmoid  nodes  in  the  hidden  layer  and  a  sin¬ 
gle  linear  node  in  the  output  layer  is  chosen.  The  9 
elements  of  the  input  vector  are  Mach  number,  angle 
of  attack,  pitch  rate,  density,  angle  of  attack  rate,  al¬ 
titude,  throttle  position,  elevator  deflection,  and  flap 
position: 


Cz  =  Cz(M,a.g,p,d,h,6T,6E,6r)  (50) 

This  9-20-1  network  has  221  adjustable  weights  and 
biases.  The  network  is  initialized  with  random  weights 
selected  from  a  zero-mean,  Gaussian  distribution  with 
a  variance  of  0.1. 

The  initial  state  and  process  noise  co\'ariance  matri¬ 
ces  for  the  two  training  algorithms  are  identical.  The 
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initial  state  cox-ariance,  Po,  is  10^1,  and  the  process 
noise  co\'ariance  matrix,  Q.  is  The  large  initial 

state  covariance  indicates  that  the  initial  weights  are 
unknown.  The  small  process  noise  co\'ariance  allows 
the  weights  to  converge  but  prevents  the  state  covari¬ 
ance  from  converging  to  zero. 

Training  of  the  networks  proceeds  in  an  identical 
manner  for  each  of  the  algorithms: 

1.  Iterate  50  times  through  each  of  the  60  static 
training  points  to  initialize  the  network  in  the 
neighborhood  of  the  desired  function. 

2.  Present  the  first  dj'namic  training  historj'. 

3.  Present  the  60  static  training  points. 

4.  Repeat  steps  2  and  3  for  the  nex,  dynamic  train¬ 
ing  point. 

5.  Repeat  steps  2  to  4  until  the  desired  convergence 
is  achieved. 

For  the  results  presented  below,  12  iterations  through 
the  9  dynamic  training  points  were  conducted.  Since 
each  history  is  20  sec  long,  a  total  of  2160  sec  (36  min) 
of  simulated  flight  data  (sampled  at  0.1-sec  intei^'als) 
plus  9480  static  training  points  were  presented  to  each 
network. 

Coefficient  Identification 

Using  the  procedure  described  above,  a  neural  network 
model  of  Cz(x,u)  is  developed  with  the  F-training 
method.  The  orocess  noise  matrix.  R.  is  0.1  for  this 
scalar  example.  After  training,  the  network  model  pre¬ 
cisely  matches  the  coefficient  histories  at  each  of  the 
dynamic  training  points.  The  actual  and  network  es¬ 
timated  Cz  histories  are  given  in  Fig.  6  for  dynamic 
training  point  #9.  The  histories  are  indistinguishable. 
The  other  8  histories  are  equally  well  matched.  It 
must  be  emphasized  that  these  excellent  results  are 
obtained  for  large  maneuvers  (angles  of  attack  rang¬ 
ing  from  —5°  to  well  over  the  stall  angle  of  attack  of 
ft;  16°)  at  9  training  points  covering  the  entire  flight 
envelope  from  stall  speeds  at  sea  level  (Point  #1)  to 
Mach  0.85  at  37,000  feet  (Point  #4). 

For  use  in  nonlinear  control  laws,  the  network  esti¬ 
mate  of  ^z(x,  u)  is  more  important  than  the  estimate 
of  Cz{t).  The  network  value  of  Cz{q)  is  plotted  in 
Fig.  7  based  on  dynamic  training  point  #9.  The  inter¬ 
section  of  the  two  curves  occurs  at  the  trim  condition, 
but  nowhere  else  does  the  model  fit  well.  Although 
generating  excellent  histories,  the  network  has  con¬ 
verged  to  an  inadequate  representation  of  the  under- 
Ijing  aerodynamic  coefficient  model.  Equally  poor  re¬ 
sults  are  found  at  all  of  the  other  training  points.  The 
network  has  apparently  assigned  false  dependences  of 
Cz  on  other  network  inputs  that  are  closely  correlated 
aith  Q(f)  for  the  training  maneuvers. 


Coefficient  and  Derivative  Identification 

Using  the  procedure  outlined  above  and  an  FD- 
learning  algorithm,  a  second  network  is  developed  to 
model  Cz(x.u).  In  addition  to  Cz{t},  a  single  par¬ 
tial  derirative  history,  Cz^(t),  is  used  during  training. 
The  process  noise  covariance  matrix  is 


The  larger  R(2, 2)  element  means  that  the  derisative 
history  is  assumed  to  be  noisier  and  less  reliable  for 
training.  As  this  element  gets  larger,  the  results  ap¬ 
proach  that  of  the  F-trained  network,  and  the  deri^•a- 
tive  information  is  ignored.  Cz(t)  and  Cz(q)  plots  for 
the  fully  trained  network  at  dynamic  training  point  #9 
are  given  in  Figs.  8  and  9.  Th<*  Cz(t)  history  is  not 
as  well  matched  for  the  FD-trained  network  as  if  was 
for  the  F-trained  network.  The  Cz(o)  match  is  much 
better.  The  additional  training  information  forces  an 
excellent  match  of  the  slope  in  the  training  regions 
arour  ^  the  initial  trim  condition  of  5°  angle  of  attack. 
The  C^(q)  fit  deteriorates  in  the  region  above  approx¬ 
imately  10°,  where  the  training  set  contained  little 
data.  This  mismatch  gives  rise  to  the  mismatched  his¬ 
tories  around  5  and  8  seconds  in  Fig.  8,  where  the 
angle  of  attack  is  outside  of  the  range  (—5°,  10°).  Ad¬ 
ditional  training  data  in  the  higher  angle-of-attack  re¬ 
gions  should  make  the  fit  better  in  those  regions. 

Discussion 

While  present  results  arc  promising,  the  neural  net¬ 
work  aerodynamic  model  was  trained  and  tested  in  a 
limited  setting.  Many  problems  relating  to  tradition¬ 
ally  hard  system  identification  questions  remain  to  be 
addressed  before  any  fin;'  judgments  can  be  made.  For 
example,  the  inputs  to  the  plant  must  be  rich  enough 
for  the  identifier  to  extract  the  best  estimate  of  the 
true  model  from  the  data.  In  the  examples  above, 
the  “3211"  input  provides  good  excitation  of  the  two 
dominant  longitudinal  aircraft  modes,  but  it  probably 
is  not  rich  enough  to  identify  a  complete  aerodynamic 
model. 

It  clearly  is  important  not  only  to  span  the  space 
over  which  the  aerodynamic  model  is  to  be  defined  but 
to  minimize  correlations  in  the  network  input  histories. 
Operating  points  and  maneuvers  must  be  chosen  to 
promote  orthogonality  of  the  network  inputs. 

The  state/force/force-derivative  EKBF  is  not  yet 
implemented.  Questions  relating  to  observability’  of 
the  various  derivatives  are  likely  to  arise.  It  should  be 
possible  to  extract  the  dominant  derivati^’es,  such  as 
Cz.i  vith  sufficient  accuracy  to  aid  the  neural  network 
training  process. 
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Figure  6:  Normal  Force  Coefficient  History  using 
Function  Training  (Mach  0.53,  30,000  ft.). 


Time.  I.  tec 


Figure  8:  Norma'  Force  Coefficient  Histoiy  using 
Function-Derivative  Training  (Mach  0.53,  30, 0(  ft.). 

Conclusion 

Accurate  aerodynamic  coefficient  models  are  derived 
from  simulated  flight-test  data  using  a  system  iden¬ 
tification  model  composed  of  an  extended  Kalman- 
Bucy  filter  for  state  and  force  estimation  and  a  com¬ 
putational  neural  network  for  aerodynamic  modeling. 
An  extended  Kalman  filter  network  training  algorithm 
based  on  function  error  alone  is  shown  to  produce  an 
exceUent  force-history  match,  though  the  functional 
dependence  of  force  on  specific  inputs  is  not  well  iden¬ 
tified.  Including  information  about  derivatives  of  the 
function  adth  respect  to  its  inputs  greatly  improves  the 
functional  fit.  Networks  are  a'ell-trained  using  static 
input  data  uniformly  distributed  through  the  input 
space  and  using  dynamic  input  data  generated  from 
simulated  flight  tests.  Good  functional  fits  were  ob¬ 


Angle  of  Anack,  a.  deg 


Figure  7;  Normal  Force  Curve  using  Function  IVaining 
(Mach  0.53,  30.000  ft.). 


Angle  of  Attack,  a.  deg 


Figure  9:  Normal  Force  Curve  using  Function-Deriva¬ 
tive  Training  (Mach  0.53.  30,000  ft.). 

tained  a’ith  both  static  and  dynamic  inputs  in  the  re¬ 
gions  of  available  training  data.  Dynamic  maneux^rs 
that  span  the  input  space  and  minimize  correlations  of 
the  inputs  must  be  defined  for  effective  aerodjmamic 
modeling  using  computational  neural  networks. 
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